Classifying Idiomatic and Literal Expressions Using Vector Space Representations
Abstract
We describe an algorithm for the automatic classification of idiomatic and literal expressions. Our starting point is that idioms and literal expressions occur in different contexts: idioms tend to violate cohesive ties in their local contexts, while literals are expected to fit in. Our goal is to capture this intuition using a vector representation of words. We propose two approaches: (1) Compute the inner product of context word vectors with the vector representing a target expression. Since literal vectors predict local contexts well, their inner product with contexts should be larger than that of idiomatic ones, thereby telling literals apart from idioms. (2) Compute literal and idiomatic scatter (covariance) matrices from local contexts in word vector space. Since the scatter matrices represent context distributions, we can then measure the difference between the distributions using the Frobenius norm. We provide experimental results validating the proposed techniques.
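The two approaches named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the averaging of inner products over context words, the normalization of the scatter matrix by the number of context words, and the nearest-scatter decision rule are all assumptions made for illustration. The abstract states only that inner products and a Frobenius-norm difference between scatter matrices are used.

```python
import numpy as np


def context_score(target_vec, context_vecs):
    """Approach (1), sketched: mean inner product between the target
    expression vector and each context word vector.
    Averaging over the context is an assumption of this sketch."""
    context_vecs = np.asarray(context_vecs, dtype=float)
    target_vec = np.asarray(target_vec, dtype=float)
    return float(np.mean(context_vecs @ target_vec))


def scatter_matrix(context_vecs):
    """Approach (2), sketched: scatter (covariance) matrix of the context
    word vectors. Dividing by the number of vectors is an assumption."""
    x = np.asarray(context_vecs, dtype=float)
    centered = x - x.mean(axis=0)
    return centered.T @ centered / x.shape[0]


def frobenius_distance(a, b):
    """Frobenius norm of the difference between two matrices."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b), ord="fro"))


def classify_by_scatter(query_vecs, literal_scatter, idiom_scatter):
    """Hypothetical decision rule: label a context by whichever reference
    scatter matrix is closer in Frobenius norm."""
    s = scatter_matrix(query_vecs)
    d_literal = frobenius_distance(s, literal_scatter)
    d_idiom = frobenius_distance(s, idiom_scatter)
    return "literal" if d_literal <= d_idiom else "idiomatic"
```

In use, `context_score` would be computed for a target expression against the word vectors of its surrounding context, with a higher score suggesting a literal reading, while `classify_by_scatter` would compare a new context against scatter matrices estimated from labeled literal and idiomatic contexts. Any threshold on the inner-product score would have to be chosen on held-out data.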
Similar resources
Experiments in Idiom Recognition
Some expressions can be ambiguous between idiomatic and literal interpretations depending on the context they occur in, e.g., sales hit the roof vs. hit the roof of the car. We present a novel method of classifying whether a given instance is literal or idiomatic, focusing on verb-noun constructions. We report state-of-the-art results on this task using an approach based on the hypothesis that ...
Classifying Idiomatic and Literal Expressions Using Topic Models and Intensity of Emotions
We describe an algorithm for automatic classification of idiomatic and literal expressions. Our starting point is that words in a given text segment, such as a paragraph, that are high-ranking representatives of a common topic of discussion are less likely to be a part of an idiomatic expression. Our additional hypothesis is that contexts in which idioms occur, typically, are more affective and ...
Classifier Combination for Contextual Idiom Detection Without Labelled Data
We propose a novel unsupervised approach for distinguishing literal and non-literal use of idiomatic expressions. Our model combines an unsupervised and a supervised classifier. The former bases its decision on the cohesive structure of the context and labels training data for the latter, which can then take a larger feature space into account. We show that a combination of both classifiers lea...
In God We Trust. All Others Must Bring Data. - W. Edwards Deming. Using Word Embeddings to Recognize Idioms
Expressions, such as add fuel to the fire, can be interpreted literally or idiomatically depending on the context they occur in. Many Natural Language Processing applications could improve their performance if idiom recognition were improved. Our approach is based on the idea that idioms violate cohesive ties in local contexts, while literal expressions do not. We propose two approaches: 1) Com...
Automatic Idiom Recognition with Word Embeddings
Expressions, such as add fuel to the fire, can be interpreted literally or idiomatically depending on the context they occur in. Many Natural Language Processing applications could improve their performance if idiom recognition were improved. Our approach is based on the idea that idioms and their literal counterparts do not appear in the same contexts. We propose two approaches: (1) Compute in...